A Logic Programming Approach to Behaviour Recognition 
Abstract

We present a system for recognising human behaviour 
given a symbolic representation of surveillance videos. The 
input of our system is a set of time-stamped short-term be- 
haviours, that is, behaviours taking place in a short pe- 
riod of time — walking, running, standing still, etc — de- 
tected on video frames. The output of our system is a set 
of recognised long-term behaviours — fighting, leaving an 
object, collapsing, etc — which are pre-defined temporal 
combinations of short-term behaviours. We develop a logic 
programming implementation of the Event Calculus to ex- 
press the constraints on the short-term behaviours that, if 
satisfied, lead to the recognition of a long-term behaviour. 
We present experimental results concerning videos with sev- 
eral humans and objects, temporally overlapping and repet- 
itive behaviours. Moreover, we show that our approach, 
which is not restricted to video surveillance, is better suited 
to a class of recognition applications than state-of-the-art 
recognition systems. 



1 Introduction



We address the problem of human behaviour recognition 
by separating low-level recognition from high-level recog- 
nition. The output of the former type of recognition is a set 
of activities taking place in a short period of time — 'short- 
term behaviours'. The output of the latter type of recogni- 
tion is a set of 'long-term behaviours', that is, pre-defined 
temporal combinations of short-term behaviours. We focus 
on high-level recognition. 

To perform high-level recognition we define a set of 
long-term behaviours of interest — for example, 'leaving 
an object', 'fighting' and 'meeting' — as temporal com- 
binations of short-term behaviours — for instance, 'walk- 
ing', 'running', 'inactive' (standing still) and 'active' (body 
movement in the same position). We employ a logic programming
implementation of the Event Calculus (EC) [9],
a temporal reasoning formalism, in order to express the definition
of a long-term behaviour. More precisely, we employ
EC to express the temporal constraints on a set of short- 
term behaviours that, if satisfied, lead to the recognition of 
a long-term behaviour. 

We evaluate our approach using an existing dataset of 
short-term behaviours detected on a series of surveillance 
videos. We recognise five long-term behaviours achieving 
high Recall and Precision rates. 

Our behaviour recognition system falls under the cate- 
gory of symbolic scenario recognition systems (SSRS) — 
see, for example. El ID [IT] [TS] [H [HI. These systems 
accept as input a stream of time-stamped 'low-level' (or 
'primitive') events, which are used to recognise 'high-level' 
(or 'derived') events of interest. The definition of a high-level
event, included in an SSRS, imposes a set of constraints
on a set of subevents. We present a comparison of our be- 
haviour recognition system with state-of-the-art SSRS. We 
show that a logic programming approach is better suited to
a class of recognition applications, including video surveillance,
than purely temporal reasoning SSRS. Moreover,
we show the advantages of our approach with respect
to a state-of-the-art SSRS based on a logic programming 
implementation of an EC dialect. 

The remainder of the paper is organised as follows. First, 
we present the Event Calculus. Second, we describe the 
dataset of short-term behaviours on which we perform long- 
term behaviour recognition. Third, we present our knowl- 
edge base of long-term behaviour definitions. Fourth, we 
present our experimental results. Finally, we discuss related 
work and outline directions for further research. 

2 The Event Calculus 

Our system for long-term behaviour recognition (LTBR) 
is a logic programming implementation of an Event Cal- 
culus formalisation, expressing long-term behaviour defini- 
tions. The Event Calculus (EC), introduced by Kowalski 
and Sergot, is a formalism for representing and reasoning
about actions or events and their effects. We present



Table 1. Main Predicates of the Event Calculus.

Predicate                     Meaning

happens(Act, T)               Action Act occurs at time T

initially(F = V)              The value of fluent F is V at time 0

holdsAt(F = V, T)             The value of fluent F is V at time T

holdsFor(F = V, I)            The value of fluent F is V during intervals I

initiates(Act, F = V, T)      The occurrence of action Act at time T
                              initiates a period of time for which the
                              value of fluent F is V

terminates(Act, F = V, T)     The occurrence of action Act at time T
                              terminates a period of time for which the
                              value of fluent F is V



here the version of the EC that we employ (more details
about this version may be found in [3]).

EC is based on a many-sorted first-order predicate calcu- 
lus. For the version used here, the underlying time model is 
linear and it may include real numbers or integers. Where 
F is a fluent — a property that is allowed to have different
values at different points in time — the term F = V denotes 
that fluent F has value V . Boolean fluents are a special case 
in which the possible values are true and false. Informally, 
F = V holds at a particular time-point if F has been 
initiated by an action at some earlier time-point, and not 
terminated by another action in the meantime. 

An action description in EC includes axioms that define, 
among other things, the action occurrences (with the use of 
the happens predicate), the effects of actions (with the use 
of the initiates and terminates predicates), and the values of 
the fluents (with the use of the initially, holdsAt and holdsFor 
predicates). Table 1 summarises the main EC predicates.
Variables (starting with an upper-case letter) are assumed to 
be universally quantified unless otherwise indicated. Pred- 
icates, function symbols and constants start with a lower- 
case letter. 

The domain-independent definition of the holdsAt predi- 
cate is as follows: 

holdsAt( F = V, T ) ←
    initially( F = V ),                                       (1)
    not broken( F = V, 0, T )

holdsAt( F = V, T ) ←
    happens( Act, T' ), T' < T,
    initiates( Act, F = V, T' ),                              (2)
    not broken( F = V, T', T )



According to axiom (1) a fluent holds at time T if it held
initially (time 0) and has not been 'broken' in the meantime,
that is, terminated between times 0 and T. Axiom (2) specifies
that a fluent holds at a time T if it was initiated at some
earlier time T' and has not been terminated between T' and
T. 'not' represents 'negation by failure' [4]. The auxiliary
predicate broken is defined as follows:

broken( F = V, T1, T3 ) ←
    happens( Act, T2 ),
    T1 < T2, T2 < T3,                                         (3)
    terminates( Act, F = V, T2 )

F = V is 'broken' between T1 and T3 if an event takes
place in that interval that terminates F = V. Note that, according
to the above axioms, a fluent does not hold at the
time at which it was initiated but holds at the time at which
it was terminated.

A fluent cannot have more than one value at any time. 
The following axiom captures this feature: 

terminates( Act, F = V, T ) ←
    initiates( Act, F = V', T ),                              (4)
    V ≠ V'

Axiom (4) states that if an action Act initiates F = V'
then Act also terminates F = V, for all other possible values
V of the fluent F. We do not insist that a fluent must
have a value at every time-point. In this version of EC, 
therefore, there is a difference between initiating a Boolean 
fluent F = false and terminating F = true: the first implies, 
but is not implied by, the second. 

The intervals in which a fluent holds are computed with 
the use of the holdsFor predicate. Below is a skeleton of this 
predicate: 

holdsFor( F = V, I ) ←
    start( F = V, StartPts ),
    end( F = V, EndPts ),                                     (5)
    computeIntervals( StartPts, EndPts, I )

The start predicate computes a list of time-points in 
which F = V is initiated. If F held initially then the 
output of start includes 0. The end predicate computes a 
list of time-points in which F = V is terminated. Given the 
output of these predicates, computeIntervals computes the
maximal intervals of time-points for which F = V holds 
continuously. The computed intervals are of the form 
(T1, T2] or since(T). To save space we do not present here
the complete formalisation of holdsFor. 
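
To make the reading of axioms (1)-(3) and the holdsFor skeleton concrete, the following is a minimal Python sketch. It is not the logic programming implementation described in this paper. The event list, the dictionaries mapping actions to the fluent values they initiate or terminate, and all function names are assumptions made for illustration. The interval computation further assumes that no time-point both initiates and terminates the same fluent value.

```python
from typing import Dict, List, Optional, Set, Tuple

Event = Tuple[int, str]          # (time-point, action name)
Effects = Dict[str, Set[str]]    # action name -> fluent values it affects


def broken(value: str, t1: int, t3: int, events: List[Event],
           terminates: Effects) -> bool:
    """Axiom (3): some action at T2, with t1 < T2 < t3, terminates value."""
    return any(t1 < t2 < t3 and value in terminates.get(act, set())
               for t2, act in events)


def holds_at(value: str, t: int, events: List[Event], initiates: Effects,
             terminates: Effects, initially: bool = False) -> bool:
    """Axioms (1) and (2): held initially, or initiated before t, and not broken."""
    if initially and not broken(value, 0, t, events, terminates):
        return True
    return any(t0 < t and value in initiates.get(act, set())
               and not broken(value, t0, t, events, terminates)
               for t0, act in events)


def holds_for(value: str, events: List[Event], initiates: Effects,
              terminates: Effects, initially: bool = False
              ) -> List[Tuple[int, Optional[int]]]:
    """Maximal intervals (start, end]; an end of None stands for since(start)."""
    starts = {t for t, act in events if value in initiates.get(act, set())}
    if initially:
        starts.add(0)
    ends = sorted(t for t, act in events if value in terminates.get(act, set()))
    intervals: List[Tuple[int, Optional[int]]] = []
    for s in sorted(starts):
        if intervals and (intervals[-1][1] is None or s < intervals[-1][1]):
            continue  # value already holds at s, so s starts no new interval
        intervals.append((s, next((e for e in ends if e > s), None)))
    return intervals


# Hypothetical narrative: a light switched on twice, off once, then on again.
events = [(3, "switch_on"), (5, "switch_on"), (8, "switch_off"), (10, "switch_on")]
initiates = {"switch_on": {"light=on"}}
terminates = {"switch_off": {"light=on"}}
print(holds_for("light=on", events, initiates, terminates))
```

In this narrative the fluent value does not hold at time 3 (when it is initiated) but does hold at time 8 (when it is terminated), which matches the remark following axiom (3).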

3 A Dataset of Short-Term Behaviours 

LTBR includes an EC action description expressing 
long-term behaviour definitions. The input to LTBR is a 



symbolic representation of short-term behaviours. The out- 
put of LTBR is a set of recognised long-term behaviours. 
In [2] we used the first dataset of the CAVIAR project to
perform long-term behaviour recognition. This dataset in- 
cludes 28 surveillance videos of a public space. The videos 
are staged — actors walk around, browse information dis- 
plays, sit down, meet one another, leave objects behind, 
fight, and so on. Each video has been manually annotated 
in order to provide the ground truth for both short-term and 
long-term behaviours. The CAVIAR dataset includes the 
following short-term behaviours: walking, running, active 
and inactive. We recognised the following long-term be- 
haviours: a person leaving an object, a person being immo- 
bile, people meeting, moving together, or fighting. 

Due to the absence of a short-term behaviour for abrupt 
motion, the accuracy of the recognition of the long-term be- 
haviours meeting, moving and fighting was compromised. 
It was often impossible to distinguish between these be- 
haviours — for instance, the short-term behaviours of peo- 
ple fighting were often classified as walking or active (due 
to the absence of an abrupt motion short-term behaviour), 
leading, in certain conditions, to the recognition of fight- 
ing and meeting, or fighting and moving. To overcome this 
problem, we introduced in the CAVIAR dataset a short-term 
behaviour for abrupt motion: we manually edited the anno- 
tation of the CAVIAR videos by changing, when necessary, 
the label of a short-term behaviour to 'abrupt motion'. A 
person is said to exhibit an abrupt motion behaviour if she 
moves abruptly and her position in the global coordinate 
system does not change significantly — if it did then her 
short-term behaviour would be classified as running. A def- 
inition of abrupt motion and a description of a system de- 
tecting this type of short-term behaviour may be found in 
im, for example. 

For this set of experiments, therefore, the input to LTBR 
is: (i) the short-term behaviours walking, running, active, 
inactive and abrupt motion, along with their time-stamps, 
that is, the frame in which a short-term behaviour took 
place, (ii) the coordinates of the tracked people and ob- 
jects as pixel positions at each time-point, and (iii) the first 
time and the last time a person or object is tracked ('appears'/'disappears').
Given this input, LTBR recognises the
following long-term behaviours: a person leaving an object, 
a person being immobile, people meeting, moving together, 
or fighting. 

Short-term behaviours are represented as EC actions 
whereas the long-term behaviours that LTBR recognises 
are represented as EC fluents. In the following section we 
present example fragments of the long-term behaviour def- 
initions. The complete definitions are available with the 
source code of LTBR. 



4 Long-Term Behaviour Definitions 



The long-term behaviour 'leaving an object' is defined 
as follows: 

initiates( inactive(Object),
           leaving_object(Person, Object) = true, T ) ←
    holdsAt( appearance(Object) = appear, T ),
    holdsAt( close(Person, Object, 30) = true, T ),           (6)
    holdsAt( appearance(Person) = appear, T0 ),
    T0 < T

initiates( exit(Object),
           leaving_object(Person, Object) = false, T )        (7)



CAVIAR dataset: http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/



Axiom (6) expresses the conditions in which a 'leaving
an object' behaviour is recognised. The fluent recording
this behaviour, leaving_object(Person, Object), becomes
true at time T if Object is inactive at T, Object 'appears'
at T, there is a Person close to Object at T (in a sense to
be specified below), and Person has appeared at some time
earlier than T. The appearance fluent records the times
in which an object/person 'appears' and 'disappears'. The
close(A, B, D) fluent is true when the distance between A
and B is at most D. The distance between two tracked objects/people
is computed given their coordinates. Based on
our empirical analysis the distance between a person leaving
an object and the object is at most 30 pixel positions.

An object exhibits only inactive short-term behaviour.
Any other type of short-term behaviour would imply that
what is tracked is not an object. Therefore, the short-term
behaviours active, walking, running and abrupt motion
do not initiate the leaving_object fluent. In the CAVIAR
videos an object carried by a person is not tracked:
only the person that carries it is tracked. The object will
be tracked, that is, 'appear', if and only if the person
leaves it somewhere. Consequently, given axiom (6), the
leaving_object behaviour will be recognised only when a
person leaves an object (see the third line of axiom (6)), not
when a person carries an object.

Axiom (7) expresses the conditions in which a
leaving_object behaviour ceases to be recognised. In
brief, leaving_object is terminated when the object in
question is picked up. exit(A) is an event that takes
place when appearance(A) = disappear. An object that
is picked up by someone is no longer tracked (it 'disappears'),
triggering an exit event which in turn terminates
leaving_object.
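
The body of axiom (6) can be read as a conjunction of four checks at a single time-point. The Python sketch below illustrates that reading; it is not the LTBR implementation. The use of Euclidean distance, the `first_seen` table standing in for the appearance fluent, and all names and sample coordinates are assumptions of this sketch.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]


def close(a: Point, b: Point, d: float) -> bool:
    """close(A, B, D): the distance between A and B is at most D.
    Euclidean distance over pixel positions is an assumption."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= d


def leaving_object_initiated(obj: str, person: str, t: int,
                             inactive_at: Set[Tuple[str, int]],
                             first_seen: Dict[str, int],
                             position: Dict[Tuple[str, int], Point]) -> bool:
    """Conditions of axiom (6) evaluated at time-point t."""
    return ((obj, t) in inactive_at                 # inactive(Object) happens at t
            and first_seen[obj] == t                # Object 'appears' at t
            and close(position[(person, t)], position[(obj, t)], 30)
            and first_seen[person] < t)             # Person appeared earlier


# Hypothetical tracks: a bag first tracked at frame 40, next to alice.
first_seen = {"bag": 40, "alice": 10, "bob": 5}
inactive_at = {("bag", 40)}
position = {("bag", 40): (100, 100),
            ("alice", 40): (110, 120),
            ("bob", 40): (200, 100)}

print(leaving_object_initiated("bag", "alice", 40, inactive_at, first_seen, position))
print(leaving_object_initiated("bag", "bob", 40, inactive_at, first_seen, position))
```

Because a carried object is never tracked, it never enters `first_seen` while carried, so the check can only succeed at the frame in which the object is put down.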

The long-term behaviour immobile was defined in order 
to signify that a person is resting in a chair or on the floor, 
or has fallen on the floor (fainted, for example). Below is
an axiom of the immobile definition:



initiates( inactive(Person),
           immobile(Person) = true, T ) ←
    duration( inactive(Person), Intervals ),
    (T, T1) ∈ Intervals,
    T1 > T + 54,                                              (8)
    findall( S, shop(S), Shops ),
    far( Person, Shops ),
    happens( active(Person), T0 ),
    T0 < T



According to axiom (8), the behaviour
immobile(Person) is recognised if:



• Person stays inactive for more than 54 frames (see
lines 3-5 of axiom (8)). We chose this number of
frames given our empirical analysis of the CAVIAR
dataset. duration is a predicate computing the duration
of inactive behaviour given the number of consecutive
instantaneous inactive events. The output of duration is
a set of the maximal intervals in which a person/object
is inactive. (holdsFor computes the duration of fluents
and thus cannot be used for computing the duration of
inactive behaviour.) Note that this is not the only way
to represent durative events in EC. See [16] for alternative
representations.

• Person is not close to an information display or a
shop (see lines 6-7 of axiom (8)). If Person was
close to a shop then she would have to stay inactive
much longer than 54 frames before immobile could
be recognised. In this way we avoid classifying the behaviour
of browsing a shop as immobile. far(P, List)
is an atemporal predicate that becomes true when P is
far from every element of the List.

• Person has been active some time in the past (see lines
8-9 of axiom (8)). The definition of immobile includes
axioms requiring that Person has been walking
some time in the past (in addition to the above constraints).
We insist that Person in immobile(Person)
has been active or walking before being inactive in order
to distinguish a left object, which is inactive
from the first time it is tracked, from an immobile
person.
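
The three conditions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the LTBR code: `duration` here groups consecutive frame numbers into maximal intervals, the outcome of the far predicate is passed in as a boolean, and the function and parameter names are hypothetical.

```python
from typing import Iterable, List, Tuple


def duration(frames: Iterable[int]) -> List[Tuple[int, int]]:
    """Maximal intervals (first, last) of consecutive frame numbers in which
    an instantaneous short-term behaviour (e.g. inactive) was detected."""
    intervals: List[Tuple[int, int]] = []
    for f in sorted(set(frames)):
        if intervals and f == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], f)   # extend the current run
        else:
            intervals.append((f, f))                # start a new run
    return intervals


def immobile_initiations(inactive_frames: Iterable[int],
                         active_frames: Iterable[int],
                         far_from_shops: bool,
                         min_frames: int = 54) -> List[int]:
    """Time-points T at which axiom (8) would initiate immobile = true."""
    earliest_active = min(active_frames, default=None)
    return [t for t, t1 in duration(inactive_frames)
            if t1 > t + min_frames                   # lines 3-5 of axiom (8)
            and far_from_shops                       # lines 6-7
            and earliest_active is not None
            and earliest_active < t]                 # lines 8-9


# A person active at frame 50, then inactive for frames 100..160.
print(immobile_initiations(range(100, 161), [50], far_from_shops=True))
```

With an inactive run of frames 100..154 the strict inequality of axiom (8) fails (154 is not greater than 100 + 54), so nothing is initiated; a tracked object with no earlier active event is likewise excluded.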

immobile(Person) is terminated when Person starts
walking, running or 'disappears'; see axioms (9)-(11):

initiates( walking(Person),
           immobile(Person) = false, T )                      (9)

initiates( running(Person),
           immobile(Person) = false, T )                      (10)

initiates( exit(Person),
           immobile(Person) = false, T )                      (11)

In a similar way we may express the definitions of other
long-term behaviours. It is not difficult to see that the use
of EC, in combination with the full power of logic programming,
allows us to express behaviour definitions including
complex temporal, spatial or other constraints. Below we
present fragments of the remaining behaviour definitions of
LTBR's knowledge base.

The following axioms represent a fragment of the
moving behaviour definition:

initiates( walking(Person),
           moving(Person, Person2) = true, T ) ←
    holdsAt( close(Person, Person2, 34) = true, T ),          (12)
    happens( walking(Person2), T )

initiates( walking(Person),
           moving(Person, Person2) = false, T ) ←
    holdsAt( close(Person, Person2, 34) = false, T )          (13)

initiates( active(Person),
           moving(Person, Person2) = false, T ) ←             (14)
    happens( active(Person2), T )

initiates( running(Person),
           moving(Person, Person2) = false, T )               (15)

initiates( abrupt(Person),
           moving(Person, Person2) = false, T )               (16)

initiates( exit(Person),
           moving(Person, Person2) = false, T )               (17)

According to axiom (12), moving is initiated when two
people are walking and are close to each other (their distance
is at most 34 pixel positions). moving is terminated
when the people walk away from each other, that is, their
distance becomes greater than 34 pixel positions (see axiom
(13)), when they stop moving, that is, become active
(see axiom (14)) or inactive, when one of them starts running
(see axiom (15)), moving abruptly (see axiom (16)), or
'disappears' (see axiom (17)).
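
As an illustration of how axioms (12)-(17) interact over time, the Python sketch below replays a short, made-up track for one pair of people. It is not the LTBR implementation: it treats the pair symmetrically, covers only the axioms shown above (the axioms for inactive and the minimum-duration constraint discussed in Section 5 are omitted), and every name and sample value is hypothetical.

```python
from typing import List, Optional, Tuple

Frame = Tuple[int, str, str, float]   # (time, behaviour of P1, behaviour of P2, distance)


def moving_effect(b1: str, b2: str, dist: float,
                  threshold: float = 34) -> Optional[bool]:
    """Value given to moving(P1, P2) by the events of one time-point:
    True, False, or None when none of axioms (12)-(17) fires."""
    near = dist <= threshold
    if "walking" in (b1, b2) and not near:
        return False                                  # axiom (13)
    if b1 == b2 == "walking":
        return True                                   # axiom (12)
    if b1 == b2 == "active":
        return False                                  # axiom (14)
    if {b1, b2} & {"running", "abrupt", "exit"}:
        return False                                  # axioms (15)-(17)
    return None


def moving_intervals(track: List[Frame]) -> List[Tuple[int, Optional[int]]]:
    """Intervals (start, end] of moving; an end of None stands for since(start)."""
    intervals: List[Tuple[int, Optional[int]]] = []
    start: Optional[int] = None
    for t, b1, b2, dist in track:
        effect = moving_effect(b1, b2, dist)
        if effect is True and start is None:
            start = t
        elif effect is False and start is not None:
            intervals.append((start, t))
            start = None
    if start is not None:
        intervals.append((start, None))
    return intervals


track = [(1, "walking", "walking", 20.0),
         (2, "walking", "walking", 25.0),
         (3, "walking", "walking", 60.0),   # they drift apart
         (4, "walking", "walking", 10.0),
         (5, "running", "walking", 10.0)]   # one of them starts running
print(moving_intervals(track))
```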

meeting is recognised when the following conditions are 
satisfied: 

initiates( active(Person),
           meeting(Person, Person2) = true, T ) ←
    holdsAt( close(Person, Person2, 25) = true, T ),          (18)
    not happens( running(Person2), T ),
    not happens( abrupt(Person2), T )

initiates( inactive(Person),
           meeting(Person, Person2) = true, T ) ←
    holdsAt( close(Person, Person2, 25) = true, T ),          (19)
    not happens( running(Person2), T ),
    not happens( abrupt(Person2), T )

meeting is initiated when two people 'interact': at least 
one of them is active or inactive, the other is neither run- 
ning nor moves abruptly, and the distance between them is 
at most 25 pixel positions. This interaction phase can be 
seen as some form of greeting (for example, a handshake). 
meeting is terminated when the two people walk away from 
each other, or one of them starts running, moving abruptly 
or 'disappears'. The axioms representing the termination of 
meeting are similar to axioms (13) and (15)-(17).

Note that meeting may overlap with moving: two peo- 
ple interact and then start moving, that is, walk while being 
close to each other. In general, however, there is no fixed
relationship between meeting and moving. 

The last definition of LTBR's knowledge base concerns 
the fighting behaviour — the axiom below presents the con- 
ditions in which fighting is initiated: 

initiates( abrupt(Person),
           fighting(Person, Person2) = true, T ) ←
    holdsAt( close(Person, Person2, 24) = true, T ),          (20)
    not happens( inactive(Person2), T )

Two people are assumed to be fighting if at least one 
of them is moving abruptly, the other is not inactive, and 
the distance between them is at most 24 pixel positions. 
fighting is terminated when one of the people walks or runs 
away from the other, or 'disappears'. 
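
The initiation conditions of meeting (axioms (18) and (19)) and fighting (axiom (20)) can be compared side by side. The Python sketch below is an illustration rather than the LTBR code; the function name and the encoding of short-term behaviours as strings are assumptions. It shows that, given these axioms, a single event of Person cannot initiate both behaviours, since the initiating short-term behaviours are disjoint.

```python
from typing import Set


def initiated_behaviours(b1: str, b2: str, dist: float) -> Set[str]:
    """Long-term behaviours initiated for (Person, Person2) by Person's event b1,
    given Person2's simultaneous short-term behaviour b2 and their distance."""
    initiated: Set[str] = set()
    if (b1 in ("active", "inactive") and dist <= 25
            and b2 not in ("running", "abrupt")):
        initiated.add("meeting")                      # axioms (18) and (19)
    if b1 == "abrupt" and dist <= 24 and b2 != "inactive":
        initiated.add("fighting")                     # axiom (20)
    return initiated


print(initiated_behaviours("active", "walking", 20.0))    # greeting-like interaction
print(initiated_behaviours("abrupt", "active", 20.0))     # abrupt motion near someone
print(initiated_behaviours("abrupt", "inactive", 10.0))   # the other person is inactive
```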

5 Experimental Results 

We present our experimental results on 28 surveillance 
videos of the CAVIAR project. These videos contain 26419 
frames that have been manually annotated in order to pro- 
vide the ground truth for short-term and long-term be- 
haviours. We edited the original annotation of the CAVIAR 
videos by introducing a short-term behaviour for abrupt motion.
(The edited annotation of the CAVIAR videos is available
with the source code of LTBR.) Table 2 shows the
performance of LTBR — we show, for each long-term be- 
haviour, the number of True Positives (TP), False Positives 
(FP) and False Negatives (FN), as well as Recall and Preci- 
sion. 
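
For reference, Recall is TP / (TP + FN) and Precision is TP / (TP + FP). The short Python sketch below recomputes both from the counts reported for each behaviour. The reported two-decimal figures match truncation rather than rounding (for example, 9/17 ≈ 0.529 is reported as 0.52); that reading is an inference of this sketch, as are the function names.

```python
# behaviour: (TP, FP, FN), as reported in Table 2
RESULTS = {
    "leaving object": (4, 0, 1),
    "immobile": (9, 8, 0),
    "moving": (15, 3, 2),
    "meeting": (6, 1, 3),
    "fighting": (6, 0, 0),
}


def truncated_ratio(numerator: int, denominator: int) -> float:
    """numerator / denominator truncated (not rounded) to two decimals."""
    return (100 * numerator // denominator) / 100


def recall(tp: int, fp: int, fn: int) -> float:
    return truncated_ratio(tp, tp + fn)


def precision(tp: int, fp: int, fn: int) -> float:
    return truncated_ratio(tp, tp + fp)


for name, counts in RESULTS.items():
    print(f"{name:15} recall={recall(*counts):.2f} precision={precision(*counts):.2f}")
```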

LTBR correctly recognised 4 leaving_object behaviours.
Moreover, there were no FP. On the other hand, there was 
1 FN. This, however, cannot be attributed to LTBR because 
in the video in question the object was left behind a chair 



Table 2. Experimental Results.

Behaviour         TP   FP   FN   Recall   Precision

leaving object     4    0    1   0.8      1
immobile           9    8    0   1        0.52
moving            15    3    2   0.88     0.83
meeting            6    1    3   0.66     0.85
fighting           6    0    0   1        1



and was not tracked. In other words, the left object never 
'appeared'; it never exhibited a short-term behaviour.

Regarding immobile we had 9 TP, 8 FP and no FN. The 
recognition of immobile would be much more accurate if 
there was a short-term behaviour for the motion of lean- 
ing towards the floor or a chair. Due to the absence of 
such a short-term behaviour, the recognition of immobile 
is primarily based on how long a person is inactive. In the 
CAVIAR videos a person falling on the floor or resting in 
a chair stays inactive for at least 54 frames. Consequently 
LTBR recognises immobile if, among other things, a person 
stays inactive for at least 54 frames (we require that a per- 
son stays inactive for a longer time period if she is located 
close to a shop to avoid FP when a person is staying inactive 
browsing a shop). There are situations, however, in which 
a person stays inactive for more than 54 frames and has not 
fallen on the floor or sat in a chair: people watching a fight, 
or just staying inactive waiting for someone. It is in those 
situations that we have the FP concerning immobile. We 
expect that in longer videos recording actual behaviours (as 
opposed to the staged behaviours of the CAVIAR videos) a 
person falling on the floor or resting in a chair would be in- 
active longer than a person staying inactive while standing. 
In this case we could increase the threshold for the duration 
of inactive behaviour in the definition of immobile, thus po- 
tentially reducing the number of FP concerning immobile. 

The introduction of the abrupt motion short-term 
behaviour had no effect on LTBR's accuracy of the 
leaving_object and immobile behaviour recognition.

LTBR recognised correctly 15 moving behaviours. 
However, it also recognised incorrectly 3 such behaviours. 
These FP mainly concern people that do move together: 
walk towards the same direction while being close to each 
other. According to the manual annotation of the videos,
however, these people do not exhibit the moving long-term
behaviour. (See [10] for an evaluation of the manual annotation
of the CAVIAR dataset.)

The number of FP concerning the moving behaviour 
has substantially decreased with respect to the results presented
in [2]. One reason for this reduction is the
fact that we added a constraint to the EC action description 
of LTBR that the duration of moving exceeds a specified 



threshold. Consequently LTBR did not classify as moving 
the cases in which people happen to walk close to each other 
while moving in different directions.

Another reason for reducing the FP regarding moving 
is the introduction of abrupt motion. In the original anno- 
tation of the CAVIAR dataset the short-term behaviours of 
people fighting were sometimes classified as walking. Con- 
sequently, the behaviour of these people was incorrectly 
recognised by LTBR as moving, since, according to the 
original annotation of the CAVIAR dataset, they are walk- 
ing while being close to each other (moreover, their coordi- 
nates changed). Labelling the short-term behaviours of peo- 
ple fighting as abrupt motion resolves this issue, because 
abrupt motion does not initiate a moving behaviour. 

LTBR did not recognise 2 moving behaviours. One FN 
was due to the fact that the distance between the people 
walking together was greater than the threshold we have 
specified. Increasing this threshold would result in substantially
increasing the number of FP. Therefore we chose not
to increase it. The other FN was due to the constraint that
we added in the EC action description of LTBR that the duration
of moving should exceed a specified threshold: a
moving behaviour took place having duration less than this
threshold. We chose to keep this constraint nevertheless because
it substantially reduces the number of FP.

LTBR recognised 7 meeting behaviours, 6 of which took 
place and 1 did not take place. The FP was due to the fact 
that two people were active and close to each other, but 
were not interacting. LTBR did not recognise 3 meeting 
behaviours. 2 FN were due to the fact that the distance 
between the people in the meeting was greater than the 
threshold we have specified. If we increased that thresh- 
old LTBR would correctly recognise these 2 meeting be- 
haviours. However, the number of FP for meeting would 
substantially increase. Therefore we chose not to increase 
the threshold distance. The third FN was due to the fact that 
the short-term behaviours of the people interacting — hand- 
shaking — were classified as walking (although one of them 
was actually active). We chose to specify that walking does 
not initiate a meeting in order to avoid incorrectly recognis- 
ing meetings when people simply walk close to each other. 

In the original annotation of the CAVIAR dataset the 
short-term behaviours of people fighting were sometimes 
classified as active. Consequently, in these cases LTBR 
incorrectly recognised the meeting behaviour. The intro- 
duction of the abrupt motion short-term behaviour in the 
CAVIAR dataset reduced the number of FP concerning 
meeting — the short-term behaviours of people fighting 
are now classified as abrupt motion thus not initiating a 
meeting. 

LTBR's recognition accuracy with respect to fighting 
was perfect, that is, there were no FP or FN. The previ- 
ous version of LTBR had 8 FP and 2 FN regarding fighting. 



The increased accuracy is due to the introduction of abrupt 
motion in the CAVIAR dataset and the corresponding mod- 
ification of the fighting definition in LTBR — now only 
abrupt motion initiates fighting whereas before active and 
running short-term behaviours initiated fighting. In this 
version of LTBR there is no confusion between fighting and 
meeting — consequently, there are no FP now for fighting 
when a meeting takes place. Moreover, we now avoid the 
FN concerning fighting that were due to the fact that the 
short-term behaviours of people fighting were (sometimes) 
classified as walking — these behaviours are now classified 
as abrupt motion. 

Overall the introduction of abrupt motion in the dataset 
and the corresponding modification of the long-term be- 
haviour definitions substantially increased the recognition 
accuracy of LTBR. Behaviours that could be, and were 
confused in the past, such as fighting and meeting, and 
fighting and moving, are no longer confused. The intro- 
duction of abrupt motion, however, could, in some cases, 
reduce the recognition accuracy of LTBR. This would hap- 
pen when a person moved abruptly (say fainted and fell on 
the floor) while being close to another person that was not 
inactive at the time. In this case LTBR would incorrectly 
recognise fighting. Such a combination of short-term be- 
haviours does not take place in the CAVIAR dataset. 

6 Related Work 

A well-known system for behaviour recognition is the 
Chronicle Recognition System (CRS). A 'chronicle' can
be seen as a long-term behaviour. In order to compare 
LTBR with CRS, we expressed the long-term behaviour 
definitions presented in this paper in the CRS input lan- 
guage. Below, for instance, is a fragment of the immobile 
definition in the CRS language. An event(e, T) predicate
expresses the occurrence of an event e at time-point
T, occurs(n, m, e, (T1, T2)) expresses that event e
should take place at least n times and at most m times in the
interval [T1, T2), noevent(e, (T1, T2)) expresses
that event e should not take place in the interval [T1, T2),
while hold(f:v, (T1, T2)) expresses that the value of
fluent f should be v during the interval [T1, T2). ? is the
prefix of an atemporal variable. Details about the CRS lan- 
guage and CRS in general may be found on the web page 
of the system and in Q |5] |6l . 

(1)  chronicle immobileInitiate() {
(2)    occurs( 54, 54,
               st[inactive, ?id, ?x, ?y],
               (T1, T1+54) )
(3)    noevent( st[inactive, ?id, *, *],
               (T1-1, T1) )
(4)    hold( shop[shop1, ?sX, ?sY]:true,
               (T1, T1+1) )
(5)    ?x != ?sX or ?y != ?sY
(6)    event( st[active, ?id, *, *], T0 )
(7)    T0 < T1
(8)    noevent( st[active, ?id, *, *],
               (T0+1, T1) )
(9)    when recognized {
(10)     emit event( initiateImmobile[?id],
               T1 )
(11)   }
(12) }

CRS web page: http://crs.elibel.tm.fr/

st[?sbeh, ?id, ?x, ?y] denotes that ?id has
the short-term behaviour ?sbeh and her coordinates are
(?x, ?y). shop[?s, ?x, ?y] denotes the coordinates
(?x, ?y) of shop ?s. The above fragment of the
immobile definition expresses the conditions in which
immobile is initiated. In LTBR these conditions are expressed
by axiom (8). In brief, an immobile(?id) behaviour
is recognised if ?id stays inactive for more than 54
frames (see lines 2-3), and has been active some time in the
past (see lines 6-8). If the conditions of the above chronicle
are satisfied, an event is triggered, denoting that immobile
is initiated (see lines 9-11). The triggered event is used by
another chronicle computing the duration of an immobile
behaviour.

Recall that there was an additional constraint to the 
recognition of immobile: the person in question must be 
'far' from all shops — if the person is close to a shop then 
she would have to stay inactive much longer than 54 frames 
before immobile could be recognised. This constraint 
cannot be expressed in the CRS language for two reasons. 
First, the CRS language does not allow for the computa- 
tion of the distance between two people/objects. This is due 
to the fact that no mathematical operators are allowed in 
the constraints of atemporal variables. Our formalisation of 
immobile in the CRS language requires that the x coordinate
or the y coordinate of the person in question is different
from the respective coordinate (?sX or ?sY) of a shop (see
line 5). Clearly, this is an inappropriate formalisation of the 
constraint that a person is far from a shop. Second, it is not 
possible to express that a person is 'far' from all shops due 
to the limitations of the CRS language concerning universal 
quantification. 
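
The gap between the two formalisations can be illustrated with a small Python sketch. It is not part of LTBR or CRS: the shop coordinates, the 50-pixel distance threshold and the function names are all hypothetical, and Euclidean distance is an assumption. `far` needs both arithmetic and quantification over every shop, whereas the chronicle's line (5) only tests coordinate inequality against one shop.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def far(person: Point, shops: Dict[str, Point], min_distance: float) -> bool:
    """far(P, Shops): P is farther than min_distance from every shop."""
    return all(math.hypot(person[0] - x, person[1] - y) > min_distance
               for x, y in shops.values())


def coordinates_differ(person: Point, shop: Point) -> bool:
    """The test of line (5) of the chronicle: ?x != ?sX or ?y != ?sY."""
    return person[0] != shop[0] or person[1] != shop[1]


shops = {"shop1": (100.0, 100.0), "shop2": (300.0, 100.0)}
browsing = (101.0, 100.0)      # one pixel away from shop1
elsewhere = (200.0, 300.0)

print(coordinates_differ(browsing, shops["shop1"]), far(browsing, shops, 50.0))
print(coordinates_differ(elsewhere, shops["shop1"]), far(elsewhere, shops, 50.0))
```

A person standing one pixel from a shop passes the inequality test of line (5) but is not far from the shop, which is why that formalisation is described above as inappropriate.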

CRS, therefore, cannot be directly used for behaviour 
recognition in video surveillance applications. Moreover, 
CRS cannot be directly used for behaviour recognition in 
any application requiring any form of spatial reasoning, or 
any other type of atemporal reasoning. These limitations 
could be overcome by developing a separate tool for atem- 
poral reasoning that would be used by CRS whenever this 
form of reasoning was required. To the best of our knowl- 
edge, such extensions of CRS are not available. (Clearly, 



the computational efficiency of CRS, which is one of the 
main attractions of using this system for behaviour recogni- 
tion, would be compromised by the integration of an atem- 
poral reasoner.) 

In our approach to behaviour recognition, the availabil- 
ity of the full power of logic programming, which is one of 
the main attractions of employing the Event Calculus (EC) 
as the temporal formalism, allows for the development of 
behaviour definitions including complex temporal (EC is at 
least as expressive as the CRS language with respect to tem- 
poral representation) and atemporal constraints. 

We do not present a comparison of the computational
performance of LTBR and CRS. Such a comparison would not be
meaningful because the behaviour definitions of LTBR are more
complex than those of CRS: they also represent the atemporal
aspects of the definitions.

Our approach to behaviour recognition has a formal, 
declarative semantics. This is in contrast to other be- 
haviour recognition systems proposed in the literature dfTTl 
[TSl [l7l). The employed version of EC allows for the de- 
velopment of a recognition system capable of dealing with, 
among other things, durative (short-term and long-term) be- 
haviours, temporally overlapping, repetitive, and 'forbid- 
den' behaviours, that is, behaviours that should not take 
place within a specified time-period in order to recognise 
some other behaviour. When necessary, more expressive
EC versions may be employed (see lfT6l [121 for presenta- 
tions of the EC expressiveness). 
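
The notion of a 'forbidden' behaviour can be illustrated with a small Python sketch. It is not the EC formalisation used in LTBR. The function name, the list-of-time-points encoding and the half-open window are assumptions made here, and the check merely mimics the effect of a negated condition in a rule body.

```python
def recognise_unless_forbidden(trigger_times, forbidden_times, window):
    """Return the trigger time-points at which recognition succeeds.

    A trigger at time t is accepted only if no forbidden behaviour
    occurred in the half-open window [t - window, t).
    """
    return [
        t for t in trigger_times
        if not any(t - window <= f < t for f in forbidden_times)
    ]


# Hypothetical narrative: triggers at 100 and 200, a forbidden
# behaviour at 190, and a window of 20 time-points.
print(recognise_unless_forbidden([100, 200], [190], 20))  # [100]
```

The trigger at 200 is rejected because the forbidden behaviour at 190 falls inside its window. The trigger at 100 is unaffected.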

Paschke and colleagues have proposed a logic program- 
ming implementation of an EC version for behaviour recog- 
nition — see, for example, llT4l ITSl 1131 . Unlike our EC 
version, there is no support for multi-valued fluents — only 
Boolean fluents are considered. The use of multi-valued flu- 
ents makes the representation considerably more succinct. 
Moreover, the EC version of Paschke and colleagues does 
not include axioms for recognising a behaviour that has 
been initiated at some earlier time-point and has not (yet) 
terminated. For example, there is no built-in support for 
recognising an on-going fighting behaviour. Our treatment 
of behaviours as EC fluents, in combination with the holdsFor
predicate for computing the intervals in which a fluent
holds, allows us to overcome the above limitation. In the 
case of an on-going fighting behaviour, for instance, an an- 
swer to a query regarding fighting would be of the form 
since(T), indicating that the recognition of fighting started 
at T and has not (yet) ended. 
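
The interval computation described above can be sketched in Python as follows. This is a simplified illustration and not the holdsFor axioms of LTBR. The function name, the tuple encoding of intervals, the treatment of repeated initiations and the tie-breaking rule for coinciding time-points are all assumptions made here.

```python
def holds_for(initiations, terminations):
    """Compute the maximal intervals in which a fluent holds.

    Closed intervals are returned as (start, end) tuples. An interval
    that has been initiated but not (yet) terminated is returned as
    ("since", start). Assumptions: an initiation while the fluent
    already holds is ignored, and a termination is processed before
    an initiation at the same time-point.
    """
    inits = set(initiations)
    terms = set(terminations)
    intervals = []
    start = None  # start of the currently open interval, if any
    for t in sorted(inits | terms):
        if start is not None and t in terms:
            intervals.append((start, t))
            start = None
        if start is None and t in inits:
            start = t
    if start is not None:
        intervals.append(("since", start))
    return intervals


# Hypothetical narrative: fighting is initiated at 10 and 40 and
# terminated at 25, so the second occurrence is still on-going.
print(holds_for([10, 40], [25]))  # [(10, 25), ('since', 40)]
```

The last element of the result plays the role of the since(T) answer: the behaviour was initiated at 40 and no later termination is known.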

7 Summary and Further Work 

We presented a system for long-term behaviour recognition
using a symbolic representation of short-term behaviours
detected on surveillance videos. We experimentally verified
our intuition that explicitly representing abrupt motion as a
short-term behaviour, as opposed to implicitly representing it
by means of the active or walking behaviours, significantly
improves the recognition of a class of long-term behaviours.

Our behaviour recognition system, LTBR, falls under the
category of symbolic scenario recognition systems. LTBR
is a logic programming implementation of the Event Calculus
(EC). We showed that a logic programming approach is better
suited than purely temporal reasoning systems to a class of
recognition applications, including video surveillance.
Moreover, we showed the advantages of our approach with
respect to a state-of-the-art recognition system based on a
logic programming implementation of an EC dialect.

A logic programming approach to behaviour recognition 
has the advantage that machine learning techniques can be 
directly employed in order to adapt a knowledge base of 
behaviour definitions with the aim of improving behaviour 
recognition accuracy. An area of current work is the use
of inductive logic programming (ILP) techniques for
automatically fine-tuning behaviour definitions (see, for
example, ||T| for an application of ILP techniques on EC 
formalisations). 

Another direction of current research concerns the ex- 
perimental validation of our approach to behaviour recogni- 
tion in the fields of emergency rescue operations and pub- 
lic transport services. In the context of the EU-project 
PRONTO we will define and recognise long-term be- 
haviours that take place in the aforementioned application 
domains with the aim of supporting intelligent resource 
management. 